22 research outputs found

    An End-to-end Neural Natural Language Interface for Databases

    The ability to extract insights from new data sets is critical for decision making. Visual interactive tools play an important role in data exploration since they provide non-technical users with an effective way to visually compose queries and comprehend the results. Natural language has recently gained traction as an alternative query interface to databases, with the potential to enable non-expert users to formulate complex questions and information needs efficiently and effectively. However, understanding natural language questions and translating them accurately to SQL is a challenging task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet made their way into practical tools and commercial products. In this paper, we present DBPal, a novel data exploration tool with a natural language interface. DBPal leverages recent advances in deep models to make query understanding more robust in the following ways: First, DBPal uses a deep model to translate natural language statements to SQL, making the translation process more robust to paraphrasing and other linguistic variations. Second, to support users in phrasing questions without knowing the database schema or the query features, DBPal provides a learned auto-completion model that suggests partial query extensions during query formulation and thus helps users write complex queries.
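    The auto-completion idea can be illustrated with a toy sketch: given a partially typed question, suggest likely next words ranked by how often they follow the current word in a log of past questions. DBPal uses a learned model for this; the bigram-frequency ranking below, including the example query log, is only a hypothetical stand-in.

```python
from collections import Counter

def build_suggester(query_log):
    """Count which word follows each word in a log of past questions."""
    follows = Counter()
    for q in query_log:
        words = q.lower().split()
        for a, b in zip(words, words[1:]):
            follows[(a, b)] += 1

    def suggest(partial, k=3):
        """Rank continuations of the last typed word by frequency."""
        last = partial.lower().split()[-1]
        ranked = [(w, c) for (p, w), c in follows.items() if p == last]
        ranked.sort(key=lambda t: -t[1])
        return [w for w, _ in ranked[:k]]

    return suggest

# Hypothetical log of previously formulated questions.
log = ["show average salary by department",
       "show average age by city",
       "show total salary by department"]
suggest = build_suggester(log)
# suggest("show") ranks "average" first (seen twice after "show").
```

    A real system would condition on the database schema and the full question prefix rather than a single word, but the interaction pattern — propose partial extensions while the user types — is the same.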

    Towards Interactive Summarization of Large Document Collections

    No full text

    Automated Ontology Refinement Using Compression-Based Learning

    No full text
    In this thesis, we propose an approach to refine ontologies for a given domain based on training corpora. We use the Minimum Description Length principle to assess the fit between ontology and text and to identify suitable refinement operations. For that, we need to calculate a score based on finding a representation of the text using the ontology. We propose restrictions to the search space and introduce heuristic functions to find this representation in a reasonable amount of time. Further heuristics are suggested to find modifications that improve the fit without the need to try every possible operation. We implement a framework for the refinement process that contains several refinement operations and can easily be extended with others. The functionality of the approach as well as the correctness of the implementation are tested with an extensive series of experiments. Synthetic data is used to confirm our hypotheses; afterwards, the algorithms are applied to real data. We also show that our system copes with large corpora containing millions of words. The resulting ontologies are evaluated using well-known metrics from ontology engineering. They can then be used in all kinds of natural language processing approaches that depend on ontologies. Additionally, we show how parts of our system can be used to solve natural language processing tasks directly. We suggest how its theoretical foundation can be used in classification tasks and show a practical application for such a task, namely semantic topic detection.
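    The MDL fit between ontology and text can be illustrated as a toy two-part code length: the bits needed to state the ontology plus the bits needed to encode the corpus given the concepts it provides. The uniform coding scheme below is a deliberate simplification invented for illustration, not the thesis's actual scoring function.

```python
import math

def description_length(ontology_terms, corpus_words, vocab_size):
    """Toy two-part MDL score: L(ontology) + L(corpus | ontology).
    Words covered by the ontology get a short code over the small
    concept set; uncovered words fall back to a long uniform code
    over the whole vocabulary."""
    # Part 1: cost of stating the ontology itself.
    l_ontology = len(ontology_terms) * math.log2(vocab_size)
    # Part 2: cost of the corpus given the ontology.
    concepts = set(ontology_terms)
    l_corpus = 0.0
    for w in corpus_words:
        if w in concepts:
            l_corpus += math.log2(len(concepts))  # short code: concept index
        else:
            l_corpus += math.log2(vocab_size)     # long code: raw word
    return l_ontology + l_corpus

# A refinement operation is worthwhile only if it lowers the total
# code length on the training corpus.
corpus = ["cat"] * 10 + ["dog"] * 10 + ["the"] * 5
with_ontology = description_length(["cat", "dog"], corpus, vocab_size=1000)
without_ontology = description_length([], corpus, vocab_size=1000)
```

    The search the thesis describes then amounts to trying candidate refinement operations, guided by heuristics, and keeping those that reduce this score.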

    Netted?! How to Improve the Usefulness of Spider & Co.

    No full text
    Natural language interfaces for databases (NLIDBs) are an intuitive way to access and explore structured data. That makes challenges like Spider (Yale's semantic parsing and text-to-SQL challenge) valuable, as they produce a series of approaches for NL-to-SQL translation. However, the resulting contributions leave something to be desired. In this paper, we analyze the usefulness of the submissions to the leaderboard for future research. We also present a prototypical implementation called UniverSQL that makes these approaches easier to use in information access systems. We hope that this lowered barrier encourages (future) participants of these challenges to add support for actual usage of their submissions. Finally, we discuss what could be done to improve future benchmarks and shared tasks for (not only) NLIDBs.

    DBPal: A Novel Lightweight NL2SQL Training Pipeline

    No full text
    Natural language (NL) is a promising alternative interface to database management systems (DBMSs) because it enables non-technical users to formulate complex questions. Recently, deep learning has gained traction for translating natural language to SQL. However, the core problem with existing deep learning approaches is that they require an enormous amount of manually curated training data in order to provide accurate translations. We present DBPal, which uses a novel training pipeline that synthesizes its training data to learn NL2SQL interfaces and thus does not rely on manually curated training data.
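    The abstract's core idea — synthesizing NL/SQL training pairs instead of curating them by hand — can be sketched as template instantiation over a database schema. The templates, table, and columns below are invented for illustration; the actual pipeline additionally applies paraphrasing and augmentation steps to the generated questions.

```python
from itertools import product

# Hypothetical schema vocabulary; a real pipeline would read this
# from the database catalog.
tables = {"patients": ["age", "length_of_stay"]}

# Hypothetical NL/SQL template pairs with schema slots.
templates = [
    ("what is the average {col} of {table}",
     "SELECT AVG({col}) FROM {table}"),
    ("show the maximum {col} in {table}",
     "SELECT MAX({col}) FROM {table}"),
]

def synthesize(tables, templates):
    """Instantiate every (template, table, column) combination into an
    (NL question, SQL query) training pair."""
    pairs = []
    for (nl_t, sql_t), (table, cols) in product(templates, tables.items()):
        for col in cols:
            pairs.append((nl_t.format(col=col, table=table),
                          sql_t.format(col=col, table=table)))
    return pairs

pairs = synthesize(tables, templates)
# 2 templates x 2 columns over 1 table yield 4 synthetic training pairs.
```

    Because the pairs are generated rather than hand-labeled, the training set grows automatically with the schema and the template library.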

    Interactive Summarization of Large Document Collections

    No full text
    We present a new system for custom summarization of large text corpora at interactive speed. Producing textual summaries is an important step towards understanding large collections of topic-related documents and has many real-world applications in journalism, medicine, and other fields. Key to our system is that the summarization model is refined by user feedback and called multiple times to improve the quality of the summarization iteratively. To that end, the human is brought into the loop to gather feedback in every iteration about which aspects of the intermediate summaries satisfy their individual information needs. Our system consists of a sampling component and a learned model to produce a textual summary. As we show in our evaluation, our system can provide a similar quality level as existing summarization models that work on the full corpus and hence cannot provide interactive speed.
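    The human-in-the-loop refinement can be sketched in a few lines: score candidate sentences, show a summary, then boost or penalize the terms the user marks as relevant or irrelevant before the next round. The word-weight scoring and the update factors below are deliberately simple stand-ins for the learned model the paper describes.

```python
def summarize(sentences, weights, k=2):
    """Pick the k sentences whose words carry the highest total weight."""
    score = lambda s: sum(weights.get(w, 1.0) for w in s.lower().split())
    return sorted(sentences, key=score, reverse=True)[:k]

def apply_feedback(weights, liked=(), disliked=()):
    """Update word weights from user feedback on the current summary."""
    for w in liked:
        weights[w] = weights.get(w, 1.0) * 3.0   # emphasize relevant terms
    for w in disliked:
        weights[w] = weights.get(w, 1.0) * 0.5   # de-emphasize the rest
    return weights

docs = ["the trial tested a new drug",
        "the drug reduced symptoms significantly",
        "funding for the trial came from a grant"]

weights = {}
first = summarize(docs, weights, k=1)    # initial, feedback-free summary
weights = apply_feedback(weights, liked=["drug"], disliked=["funding"])
second = summarize(docs, weights, k=1)   # refined towards the user's interest
```

    Each iteration thus re-ranks the candidate material under the updated preferences; the paper's sampling component additionally keeps each round fast by avoiding a pass over the full corpus.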

    Towards Robust and Transparent Natural Language Interfaces for Databases

    No full text
    In recent years, the field of research on natural language interfaces for databases (NLIDBs) has progressed considerably, as can be seen from the results of challenges like Spider. However, most of these approaches concentrate on delivering best-guess answers and improving (computational) accuracy. Yet, there are still many open issues regarding robustness, confidence, and transparency. Therefore, this vision paper points to relevant milestones as well as corresponding opportunities for addressing them, opening up a potential path for the future development of NLIDBs.